Positioning Unknown Words in a Thesaurus by Using Information Extracted from a Corpus
Author
Abstract
This paper describes a method for positioning unknown words in an existing thesaurus by using word-to-word relationships with relation (case) markers extracted from a large corpus. A suitable area of the thesaurus for an unknown word is estimated by integrating the human intuition buried in the thesaurus and statistical data extracted from the corpus. To overcome the problem of data sparseness, distinguishing features of each node, called "viewpoints", are extracted automatically and used to calculate the similarity between the unknown word and a word in the thesaurus. The results of an experiment confirm the contribution of viewpoints to the positioning task.
Similar resources
A Method for Keyword Extraction and Word Weighting to Improve Persian Text Classification
Owing to the ever-increasing expansion of information and the huge amount of existing unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word-net) to extract them automatically. A...
Knowledge Acquisition: Classification of Terms in a Thesaurus from a Corpus
Faced with the growing volume and accessibility of electronic textual information, information retrieval and, in general, automatic documentation require updated terminological resources that are ever more voluminous. A current problem is the automated construction of these resources (e.g., terminologies, thesauri, glossaries, etc.) from a corpus. Various linguistic and statistical methods to ...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian e-books in the field of science using LDA topic modeling, evaluating the similarity of the extracted keywords with the golden standard and users' views of the model keywords. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...
A New Method for Automatic Indexing and Keyword Extraction for Information Retrieval and Text Clustering
Persian words are written in diverse forms, and a fixed set of rules cannot cover all grammatical forms of a word, so extracting keywords automatically from Persian texts is difficult and complex. This thesis attempts to use linguistic information and a thesaurus to provide keywords. Using the symbol system is structured network can be keywords, i...
Word Usage: Newspaper Text versus the Web
This paper explores the differences in words and word usage in two corpora – one derived from newspaper text and the other from the web. A corpus of web pages is compiled from a controlled traversal of the web, producing a topic-diverse collection of 2 billion words of web text. We compare this Web Corpus with the Gigaword Corpus, a 2 billion word collection of news articles. The Web Corpus is ...